AITopics | motion capture data

motion capture data

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

0626822954674a06ccd9c234e3f0d572-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 09:32:04 GMT

All neural networks used in this work are fully connected feed-forward networks. First-order NODEs are used for the single-cell data, while second-order NODEs are used for the synthetic example and the motion capture data. In the second-order NODEs, the initial velocities are predicted by a neural network with two hidden layers of 20 or 100 neurons (depending on the dataset) and ELU activation functions. The main architecture, which infers velocities (or accelerations), likewise has two hidden layers of 20 or 100 neurons, depending on the size of the input, with ELU activations. As the ODE solver, we use the explicit fifth-order Dormand-Prince method, commonly denoted dopri5.
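The setup described above (a second-order NODE whose acceleration field is a fully connected ELU network, integrated with dopri5) can be sketched in dependency-free Python. This is an illustrative sketch, not the authors' code: it uses the standard Dormand-Prince 5(4) coefficients with a fixed step size (practical solvers adapt the step using the embedded error estimate), the network weights are random rather than trained, and the separate initial-velocity network is omitted. The helper names (`dopri5_step`, `second_order`, `make_mlp`) are hypothetical.

```python
import math
import random

# Dormand-Prince 5(4) Butcher tableau, i.e. the coefficients behind "dopri5".
C = [0.0, 1 / 5, 3 / 10, 4 / 5, 8 / 9, 1.0, 1.0]
A = [
    [],
    [1 / 5],
    [3 / 40, 9 / 40],
    [44 / 45, -56 / 15, 32 / 9],
    [19372 / 6561, -25360 / 2187, 64448 / 6561, -212 / 729],
    [9017 / 3168, -355 / 33, 46732 / 5247, 49 / 176, -5103 / 18656],
    [35 / 384, 0.0, 500 / 1113, 125 / 192, -2187 / 6784, 11 / 84],
]
B5 = [35 / 384, 0.0, 500 / 1113, 125 / 192, -2187 / 6784, 11 / 84, 0.0]  # 5th-order weights
B4 = [5179 / 57600, 0.0, 7571 / 16695, 393 / 640, -92097 / 339200, 187 / 2100, 1 / 40]  # embedded 4th-order


def dopri5_step(f, t, y, h):
    """One explicit Dormand-Prince step: returns the 5th-order update and an error estimate."""
    ks = []
    for i in range(7):
        yi = [y[j] + h * sum(A[i][m] * ks[m][j] for m in range(i)) for j in range(len(y))]
        ks.append(f(t + C[i] * h, yi))
    y5 = [y[j] + h * sum(B5[i] * ks[i][j] for i in range(7)) for j in range(len(y))]
    y4 = [y[j] + h * sum(B4[i] * ks[i][j] for i in range(7)) for j in range(len(y))]
    return y5, max(abs(a - b) for a, b in zip(y5, y4))


def integrate(f, y0, t0, t1, n_steps):
    """Fixed-step integration; an adaptive solver would resize h from the error estimate."""
    h = (t1 - t0) / n_steps
    y, t = list(y0), t0
    for _ in range(n_steps):
        y, _err = dopri5_step(f, t, y, h)
        t += h
    return y


def second_order(accel):
    """Rewrite x'' = accel(x, v) as a first-order system on the stacked state [x, v]."""
    def f(t, state):
        d = len(state) // 2
        x, v = state[:d], state[d:]
        return v + accel(x, v)  # list concatenation: [dx/dt, dv/dt]
    return f


def elu(z, alpha=1.0):
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def make_mlp(sizes, seed=0):
    """Fully connected feed-forward net with ELU hidden layers and a linear output.
    Weights are random (untrained), purely for illustration."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))

    def forward(inp):
        h = list(inp)
        for k, (W, b) in enumerate(layers):
            h = [sum(w * x for w, x in zip(row, h)) + bk for row, bk in zip(W, b)]
            if k < len(layers) - 1:  # no activation on the output layer
                h = [elu(z) for z in h]
        return h
    return forward


# Sanity check on x'' = -x, whose exact solution from x = 1, v = 0 is cos(t).
oscillator = second_order(lambda x, v: [-xi for xi in x])
x_end, v_end = integrate(oscillator, [1.0, 0.0], 0.0, 1.0, 10)
print(abs(x_end - math.cos(1.0)) < 1e-6)  # True

# Hypothetical learned dynamics: acceleration net with two hidden layers of 20 ELU units.
d = 3
accel_net = make_mlp([2 * d, 20, 20, d])
model = second_order(lambda x, v: accel_net(x + v))
print(len(integrate(model, [0.1] * (2 * d), 0.0, 1.0, 20)))  # 6
```

The stacked state [x, v] is what makes the model second-order: the network only predicts accelerations, and the solver integrates them into both velocities and positions.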

artificial intelligence, lassonet, machine learning

Neural Information Processing Systems

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

Neural Information Processing Systems, Mar-21-2026, 14:58:44 GMT

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.76)

Reduced-Order Model-Guided Reinforcement Learning for Demonstration-Free Humanoid Locomotion

Liu, Shuai, Lau, Meng Cheng

arXiv.org Artificial Intelligence, Sep-24-2025

We introduce Reduced-Order Model-Guided Reinforcement Learning (ROM-GRL), a two-stage reinforcement learning framework for humanoid walking that requires neither motion capture data nor elaborate reward shaping. In the first stage, a compact four-degree-of-freedom (4-DOF) reduced-order model (ROM) is trained with Proximal Policy Optimization to generate energy-efficient gait templates. In the second stage, these dynamically consistent trajectories guide a full-body policy trained with Soft Actor-Critic augmented by an adversarial discriminator, which drives the distribution of the student's five-dimensional gait features to match that of the ROM's demonstrations. Experiments at 1 m/s and 4 m/s show that ROM-GRL produces stable, symmetric gaits with substantially lower tracking error than a pure-reward baseline. By distilling lightweight ROM guidance into high-dimensional policies, ROM-GRL bridges the gap between reward-only and imitation-based locomotion methods, enabling versatile, naturalistic humanoid behaviors without any human demonstrations.
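The second stage pairs the RL objective with an adversarial discriminator over five-dimensional gait features. As a minimal sketch of that idea, assuming a logistic-regression discriminator and the GAIL-style reward -log(1 - D(s)) (the abstract specifies neither the discriminator architecture nor the reward form), the discriminator learns to separate ROM features from student features, and its output then serves as a style reward for the policy. The Gaussian feature clouds are synthetic stand-ins, and all names are hypothetical.

```python
import math
import random

FEATURE_DIM = 5  # size of the gait feature vector mentioned in the abstract


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LogisticDiscriminator:
    """D(s) = sigmoid(w . s + b), the estimated probability that features s came from the ROM."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, s):
        return sigmoid(sum(wi * si for wi, si in zip(self.w, s)) + self.b)

    def train_step(self, rom_batch, student_batch, lr=0.1):
        """One gradient-descent step on binary cross-entropy (ROM = 1, student = 0)."""
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        labelled = [(s, 1.0) for s in rom_batch] + [(s, 0.0) for s in student_batch]
        for s, label in labelled:
            err = self.prob(s) - label  # derivative of BCE with respect to the logit
            for i, si in enumerate(s):
                grad_w[i] += err * si
            grad_b += err
        n = len(labelled)
        self.w = [wi - lr * gi / n for wi, gi in zip(self.w, grad_w)]
        self.b -= lr * grad_b / n

    def style_reward(self, s):
        """GAIL-style reward -log(1 - D(s)), clamped to avoid log(0)."""
        return -math.log(max(1.0 - self.prob(s), 1e-8))


# Synthetic stand-ins: ROM gait features near +1, current student features near -1.
rng = random.Random(0)
rom = [[rng.gauss(1.0, 0.3) for _ in range(FEATURE_DIM)] for _ in range(64)]
student = [[rng.gauss(-1.0, 0.3) for _ in range(FEATURE_DIM)] for _ in range(64)]

disc = LogisticDiscriminator(FEATURE_DIM)
for _ in range(200):
    disc.train_step(rom, student)

# A ROM-like gait earns a larger style reward than the student's current gait.
print(disc.style_reward([1.0] * FEATURE_DIM) > disc.style_reward([-1.0] * FEATURE_DIM))  # True
```

In a full pipeline the discriminator update would alternate with the Soft Actor-Critic updates, and this style reward would be combined with the task reward.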

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2509.19023

Genre: Research Report (0.65)

Industry:

Leisure & Entertainment (0.46)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Can Vision Language Models Understand Mimed Actions?

Cho, Hyundong, Lin, Spencer, Srinivasan, Tejas, Saxon, Michael, Kwon, Deuksin, Chavez, Natali T., May, Jonathan

arXiv.org Artificial Intelligence, Aug-8-2025

Nonverbal communication (NVC) plays an integral role in human language, but studying NVC in general is challenging because of its broad scope and the high variance in its interpretation across individuals and cultures. However, mime (the theatrical technique of suggesting intent using only gesture, expression, and movement) is a subset of NVC that consists of explicit, embodied actions with much lower variance in human interpretation. We argue that a solid understanding of mimed actions is a crucial prerequisite for vision-language models capable of interpreting and commanding more subtle aspects of NVC. Hence, we propose Mime Identification Multimodal Evaluation (MIME), a novel video-based question-answering benchmark comprising 86 mimed actions. Constructed with motion capture data, MIME contains variations of each action with perturbations applied to the character, background, and viewpoint to evaluate recognition robustness. We find that both open-weight and API-based vision-language models perform significantly worse than humans on MIME, motivating further research toward a more robust understanding of human gestures.
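MIME's robustness evaluation crosses each action with perturbed characters, backgrounds, and viewpoints. A minimal sketch of that bookkeeping, assuming two hypothetical levels per axis and placeholder action names (the abstract gives neither the actual levels nor the grading protocol): generate the variant grid, then report accuracy along one perturbation axis at a time.

```python
from collections import defaultdict
from itertools import product

# Hypothetical perturbation levels; the benchmark's real levels are not listed in the abstract.
PERTURBATIONS = {
    "character": ["char_a", "char_b"],
    "background": ["plain", "scene"],
    "viewpoint": ["front", "side"],
}


def make_variants(actions):
    """Cross every action with every combination of perturbation levels."""
    keys = list(PERTURBATIONS)
    return [
        {"action": action, **dict(zip(keys, levels))}
        for action in actions
        for levels in product(*(PERTURBATIONS[k] for k in keys))
    ]


def accuracy_by_factor(results, factor):
    """Accuracy of graded answers grouped by one perturbation axis."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r[factor]] += 1
        hits[r[factor]] += int(r["correct"])
    return {level: hits[level] / totals[level] for level in totals}


variants = make_variants(["hammering", "juggling"])  # placeholder action names
print(len(variants))  # 16 (2 actions x 2 x 2 x 2)

# Placeholder grading for illustration only: pretend the model fails on side views.
results = [dict(v, correct=(v["viewpoint"] == "front")) for v in variants]
print(accuracy_by_factor(results, "viewpoint"))  # {'front': 1.0, 'side': 0.0}
```

Reporting one axis at a time makes it easy to see which kind of perturbation a model is most sensitive to.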

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2506.21586

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.94)
Leisure & Entertainment > Sports (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Multimodal Foundation Model for Cross-Modal Retrieval and Activity Recognition Tasks

Matsuishi, Koki, Ukita, Kosuke, Okita, Tsuyoshi

arXiv.org Artificial Intelligence, Jun-5-2025

In recent years, the widespread adoption of wearable devices has highlighted the growing importance of behavior analysis using inertial measurement units (IMUs). While applications span diverse fields such as healthcare and robotics, recent studies have increasingly focused on multimodal analysis in addition to unimodal analysis. Several studies have proposed multimodal foundation models that incorporate first-person video and text data; however, these models still fall short of providing a detailed analysis of full-body human activity. To address this limitation, we propose Activity Understanding and Representations Alignment - Multimodal Foundation Model (AURA-MFM), a foundation model integrating four modalities: third-person video, motion capture, IMU, and text. By incorporating third-person video and motion capture data, the model enables a detailed, multidimensional understanding of human activity that first-person perspectives alone fail to capture. Additionally, a Transformer-based IMU encoder is employed to improve the model's overall performance. Experimental evaluations on retrieval and activity recognition tasks show that our model outperforms existing methods. Notably, in zero-shot action recognition, our method achieved substantially higher performance, with an F1-score of 0.6226 and an accuracy of 0.7320, compared with an F1-score of 0.0747 and an accuracy of 0.1961 for the existing method.
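The zero-shot result above is reported as an F1-score and an accuracy. A minimal sketch of how such an evaluation is commonly set up for aligned-embedding models: each sample embedding (for example, from an IMU or motion-capture encoder) is assigned the class whose text-label embedding is most cosine-similar, and the predictions are scored. The toy 2-D vectors are invented for illustration, and macro-averaged F1 is an assumption, since the abstract does not state which F1 averaging was used.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_predict(sample_embs, class_text_embs):
    """Assign each sample the class whose text embedding is most cosine-similar."""
    classes = range(len(class_text_embs))
    return [max(classes, key=lambda c: cosine(e, class_text_embs[c])) for e in sample_embs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Toy embeddings, invented for illustration.
text_embs = [[1.0, 0.0], [0.0, 1.0]]  # one embedding per class label
samples = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.4, 0.6]]
labels = [0, 1, 1, 1]
preds = zero_shot_predict(samples, text_embs)
print(preds)                                 # [0, 1, 0, 1]
print(accuracy(labels, preds))               # 0.75
print(round(macro_f1(labels, preds, 2), 4))  # 0.7333
```

Because no classifier head is trained, the quality of the zero-shot result depends entirely on how well the modality embeddings are aligned with the text embeddings.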

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2506.03174

Country: Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.66)
Information Technology (0.48)
Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

Neural Information Processing Systems, May-27-2025, 08:57:45 GMT

Enabling humanoid robots to clean rooms has long been a goal of the humanoid research community. However, many such tasks require multi-humanoid collaboration, such as carrying large and heavy furniture together. Given the scarcity of motion capture data on multi-humanoid collaboration and the efficiency challenges of multi-agent learning, these tasks cannot be addressed straightforwardly with training paradigms designed for single-agent scenarios. In this paper, we introduce Cooperative Human-Object Interaction (CooHOI), a framework that tackles multi-humanoid object transportation through a two-phase learning paradigm: individual skill learning followed by policy transfer. First, a single humanoid character learns to interact with objects through imitation learning from human motion priors.

cooperative human-object interaction, learning cooperative human-object interaction, manipulated object dynamic

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.99)

Learning Diverse Natural Behaviors for Enhancing the Agility of Quadrupedal Robots

Fu, Huiqiao, Dong, Haoyu, Xu, Wentao, Zhou, Zhehao, Deng, Guizhou, Tang, Kaiqiang, Dong, Daoyi, Chen, Chunlin

arXiv.org Artificial Intelligence, May-16-2025

Achieving animal-like agility is a longstanding goal in quadrupedal robotics. While recent studies have successfully demonstrated imitation of specific behaviors, enabling robots to replicate a broader range of natural behaviors in real-world environments remains an open challenge. Here we propose an integrated controller, comprising a Basic Behavior Controller (BBC) and a Task-Specific Controller (TSC), that can effectively learn diverse natural quadrupedal behaviors in an enhanced simulator and efficiently transfer them to the real world. Specifically, the BBC is trained using a novel semi-supervised generative adversarial imitation learning algorithm to extract diverse behavioral styles from raw motion capture data of real dogs, enabling smooth behavior transitions by adjusting discrete and continuous latent variable inputs. The TSC, trained via privileged learning with depth images as input, coordinates the BBC to efficiently perform various tasks. Additionally, we employ evolutionary adversarial simulator identification to optimize the simulator, aligning it closely with reality. After training, the robot exhibits diverse natural behaviors, successfully completing the quadrupedal agility challenge at an average speed of 1.1 m/s and achieving a peak speed of 3.2 m/s during hurdling. This work represents a substantial step toward animal-like agility in quadrupedal robots, opening avenues for their deployment in increasingly complex real-world environments.
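The abstract says behavior transitions come from adjusting the BBC's discrete and continuous latent inputs. A minimal sketch of that interface, assuming the latents are simply concatenated with the proprioceptive observation (a common conditioning choice, not confirmed by the abstract) and that a smooth transition is a linear blend of the continuous latent over several control steps. All function names are hypothetical; in the described system the TSC would be the component choosing these latents.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def bbc_input(proprio, style_id, n_styles, z_cont):
    """Policy input: proprioception + one-hot discrete style + continuous latent."""
    return list(proprio) + one_hot(style_id, n_styles) + list(z_cont)


def blend_latent(z_from, z_to, alpha):
    """Linear interpolation of the continuous latent (alpha = 0 -> z_from, 1 -> z_to)."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_from, z_to)]


def transition(z_from, z_to, n_steps):
    """Latent schedule for a gradual behavior change over n_steps >= 2 control steps."""
    return [blend_latent(z_from, z_to, k / (n_steps - 1)) for k in range(n_steps)]


print(bbc_input([0.1, 0.2, 0.3], 2, 4, [0.5, -0.5]))
# [0.1, 0.2, 0.3, 0.0, 0.0, 1.0, 0.0, 0.5, -0.5]
print(transition([0.0, 0.0], [1.0, -1.0], 5))
# [[0.0, 0.0], [0.25, -0.25], [0.5, -0.5], [0.75, -0.75], [1.0, -1.0]]
```

Keeping the discrete code fixed while sweeping the continuous latent is one way to vary a style gradually instead of switching behaviors abruptly.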

artificial intelligence, machine learning, robot

arXiv.org Artificial Intelligence

2505.09979

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.67)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Design and Development of a Locomotion Interface for Virtual Reality Lower-Body Haptic Interaction

He, An-Chi, Park, Jungsoo, Beiter, Benjamin, Kalita, Bhaben, Leonessa, Alexander

arXiv.org Artificial Intelligence, Mar-3-2025

This work presents the design, construction, control, and preliminary user data of a locomotion interface called ForceBot. It delivers lower-body haptic interaction in virtual reality (VR), enabling users to walk in VR while interacting with various simulated terrains. It uses two planar gantries to give each foot two degrees of freedom, plus passive heel-lifting motion. The design combined motion capture data with dynamic simulation to select an ergonomic human-robot workspace and suitable hardware. The system framework uses open-source robotics software paired with a custom-built power delivery system, and provides EtherCAT communication at a 1,000 Hz soft real-time computation rate. The system features an admittance controller that regulates physical human-robot interaction (pHRI), alongside a walking algorithm that generates walking motion and simulates virtual terrains. System performance is explored through three measurements that evaluate the relationship between user input force and the resulting pHRI motion. Overall, the platform takes a unique approach, using planar gantries to realize VR terrain interaction with an extensive workspace and a reasonably compact footprint, and is supported by preliminary user data.
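An admittance controller turns a measured interaction force into commanded motion by simulating a virtual mass-damper. A minimal sketch for one foot's two planar degrees of freedom, updated at the stated 1,000 Hz rate; the mass and damping values are hypothetical tuning parameters (not ForceBot's published values), the class name is invented, and terrain rendering and the walking algorithm are left out.

```python
from dataclasses import dataclass, field

DT = 1.0 / 1000.0  # control period for the 1,000 Hz rate stated in the abstract


@dataclass
class PlanarAdmittance:
    """Virtual mass-damper M*a + B*v = F for one foot's 2-DOF planar gantry.
    mass and damping are hypothetical values chosen for illustration."""
    mass: float = 10.0     # kg
    damping: float = 50.0  # N*s/m
    pos: list = field(default_factory=lambda: [0.0, 0.0])  # m
    vel: list = field(default_factory=lambda: [0.0, 0.0])  # m/s

    def update(self, force):
        """Map a measured interaction force (N, per axis) to the next commanded position."""
        for i in range(2):
            acc = (force[i] - self.damping * self.vel[i]) / self.mass
            self.vel[i] += acc * DT          # semi-implicit Euler: update velocity first
            self.pos[i] += self.vel[i] * DT  # then position, using the new velocity
        return list(self.pos)


foot = PlanarAdmittance()
for _ in range(5000):  # 5 s of a constant 20 N push along x
    target = foot.update([20.0, 0.0])
print(round(foot.vel[0], 6))  # 0.4, the steady state F / B
```

With a constant push the commanded velocity settles at F/B, so the damping term sets how resistive the virtual ground feels, while a smaller virtual mass makes the response quicker.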

interaction, interaction force, platform

arXiv.org Artificial Intelligence

2503.01271

Country:

North America > United States > Virginia (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (1.00)
Information Technology (0.68)
Energy > Power Industry (0.46)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Humanoid robot learns to Waltz with the grace of … a robot

Researchers from the University of California, San Diego have designed an AI-enabled robot that can perform a waltz simply by mirroring the moves of its human partner. As far as we can tell, the robot even managed the ballroom dance without stepping on its partner's toes. To make their dancing robot, the team first designed an AI model trained on human motion capture videos and then integrated it into two bipedal Unitree G1 robots. Using another model, those robots were then able to analyze the motions of humans in front of them and mimic those movements themselves. The result was a humanoid robot able to walk, dodge, squat, and dance seamlessly by copying a human.

artificial intelligence, humanoid robot learn, robot

Popular Science

Country: North America > United States > California > San Diego County > San Diego (0.26)

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.64)

CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

Gao, Jiawei, Wang, Ziqin, Xiao, Zeqi, Wang, Jingbo, Wang, Tai, Cao, Jinkun, Hu, Xiaolin, Liu, Si, Dai, Jifeng, Pang, Jiangmiao

arXiv.org Artificial Intelligence, Jun-20-2024

Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methods. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges of multi-agent learning, these tasks cannot be addressed straightforwardly with training paradigms designed for single-agent scenarios. In this paper, we introduce Cooperative Human-Object Interaction (CooHOI), a novel framework that addresses multi-character object transportation through a two-phase learning paradigm: individual skill acquisition followed by transfer. First, a single agent learns to perform tasks using the Adversarial Motion Priors (AMP) framework. The agent then learns to collaborate with others by considering the shared dynamics of the manipulated object during parallel training with Multi-Agent Proximal Policy Optimization (MAPPO). When one agent interacts with the object and changes its dynamics, the other agents learn to respond appropriately, achieving implicit communication and coordination among teammates. Unlike previous approaches that relied on tracking-based methods for multi-character HOI, CooHOI is inherently efficient, does not depend on motion capture data of multi-character interactions, and can be seamlessly extended to more participants and a wide range of object types.
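In the second phase, CooHOI trains the agents in parallel with MAPPO. A minimal sketch of the per-agent clipped PPO objective that MAPPO optimizes, assuming the advantages have already been computed by a centralized critic (the abstract does not describe the critic or its inputs), plus a hypothetical observation builder that shows how a shared object state lets each agent react to a teammate's push. Function names are invented for illustration.

```python
import math


def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective for one sample (to be maximized)."""
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def mappo_actor_loss(new_logps, old_logps, advantages, eps=0.2):
    """Negative mean clipped surrogate over all agents and timesteps.
    Each argument maps agent_id -> list of per-step values; advantages are
    assumed to come from a centralized critic."""
    terms = []
    for agent in new_logps:
        for lp_new, lp_old, adv in zip(new_logps[agent], old_logps[agent], advantages[agent]):
            ratio = math.exp(lp_new - lp_old)  # pi_new(a|o) / pi_old(a|o)
            terms.append(clipped_surrogate(ratio, adv, eps))
    return -sum(terms) / len(terms)


def agent_observation(own_state, object_state):
    """Hypothetical: every agent sees the manipulated object's state, so one
    agent's effect on the object appears in its teammates' observations."""
    return list(own_state) + list(object_state)


print(round(clipped_surrogate(1.5, 1.0), 6))   # 1.2: gains beyond 1 + eps are clipped
print(round(clipped_surrogate(0.5, -1.0), 6))  # -0.8: the pessimistic term is kept
```

Taking the minimum of the clipped and unclipped terms keeps each agent's policy update conservative, which matters when every agent's environment also includes its teammates' changing behavior.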

agent, arxiv preprint arxiv, interaction

arXiv.org Artificial Intelligence

2406.14558

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
